## [1] "Mon Mar 25 15:27:38 2019"
library(readr)                   # reading delimited flat files
library(data.table)              # fast tabulation and joins
library(plotly, quietly = TRUE)  # interactive plots

1 Studies

Read the file of all studies in AACT (Aggregate Analysis of ClinicalTrials.gov).

## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
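A minimal base-R sketch of this step, assuming the studies come from AACT's pipe-delimited flat-file export with an `nct_id` column. The inline `studies_txt` string is a hypothetical stand-in for the real file so the snippet runs on its own; the report's own code (which loads readr and data.table) may read the file differently.

```r
# Hypothetical stand-in for AACT's pipe-delimited studies.txt flat file.
studies_txt <- "nct_id|study_type|phase
NCT00000001|Interventional|Phase 2
NCT00000002|Observational|
NCT00000003|Interventional|Phase 1/Phase 2"

# quote = "" and comment.char = "" stop stray quotes or '#' in free text from
# breaking the parse; colClasses keeps every field as a string, and
# na.strings = "" turns empty fields into NA.
studies <- read.table(text = studies_txt, sep = "|", header = TRUE,
                      quote = "", comment.char = "",
                      colClasses = "character", na.strings = "")

# Total rows versus distinct NCT IDs, in the same form as the output above.
studies_msg <- sprintf("Total studies: %d ; unique NCT_IDs: %d",
                       nrow(studies), length(unique(studies$nct_id)))
print(studies_msg)
```

Equal counts, as in the report, indicate one row per trial.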

2 Interventional studies only

Select only studies whose study_type is Interventional.

## [1] "Interventional studies: 237892 (79.2%)"
All interventional studies, by phase
phase               N
Early Phase 1    2619
Phase 1         29795
Phase 1/Phase 2 10063
Phase 2         41637
Phase 2/Phase 3  4963
Phase 3         29662
Phase 4         25001
NA              94152
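The filter and the phase table can be sketched in base R as follows. The toy `studies` data frame and its values are hypothetical; `useNA = "ifany"` is what keeps an NA row, like the one in the table above, for studies with no phase recorded.

```r
# Hypothetical toy study table; NA phase means no phase recorded.
studies <- data.frame(
  nct_id     = sprintf("NCT%08d", 1:5),
  study_type = c("Interventional", "Interventional", "Observational",
                 "Interventional", "Interventional"),
  phase      = c("Phase 2", NA, NA, "Phase 2", "Phase 3"),
  stringsAsFactors = FALSE
)

# %in% treats a missing study_type as "not interventional" instead of NA.
interv <- studies[studies$study_type %in% "Interventional", ]
pct <- 100 * nrow(interv) / nrow(studies)
interv_msg <- sprintf("Interventional studies: %d (%.1f%%)", nrow(interv), pct)

# Studies per phase, keeping a row for missing phase.
phase_counts <- as.data.frame(table(phase = interv$phase, useNA = "ifany"),
                              responseName = "N", stringsAsFactors = FALSE)
print(interv_msg)
print(phase_counts)
```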

3 Drugs

Read the file of all drugs in AACT.

* id is the AACT intervention ID.
* One study may involve multiple drugs.

## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"
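Because one study can list several drugs and the same name recurs across studies, unique names and unique intervention IDs differ. A sketch with hypothetical rows, assuming the drugs come from an AACT interventions table filtered on `intervention_type == "Drug"` (an assumption; the report does not show its filter). Names are compared as exact strings, so `Aspirin` and `aspirin` would count as two names.

```r
# Hypothetical slice of an AACT interventions table: NCT00000001 lists two
# drugs, and "metformin" appears in two different studies.
interventions <- data.frame(
  id                = c(101L, 102L, 103L, 104L, 105L),
  nct_id            = c("NCT00000001", "NCT00000001", "NCT00000002",
                        "NCT00000003", "NCT00000003"),
  intervention_type = c("Drug", "Drug", "Drug", "Drug", "Device"),
  name              = c("metformin", "placebo", "metformin", "Aspirin", "stent"),
  stringsAsFactors  = FALSE
)

# Keep drug interventions only, then compare distinct names with distinct IDs.
drugs <- interventions[interventions$intervention_type == "Drug", ]
drugs_msg <- sprintf("Unique drug names: %d ; unique intervention IDs: %d",
                     length(unique(drugs$name)), length(unique(drugs$id)))
print(drugs_msg)
```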

3.1 Studies: drug trials only

Select only studies involving drugs.

## [1] "Drug trials: 124421 ; unique NCT_IDs: 124421"

Merge study metadata with drugs.

All drugs, by study phase
phase               N
Early Phase 1    2615
Phase 1         48593
Phase 1/Phase 2 13288
Phase 2         68850
Phase 2/Phase 3  6503
Phase 3         49507
Phase 4         36331
NA              29390
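The phase counts in the table above sum to 255,077, the number of unique intervention IDs, so they count drug interventions rather than trials. A base-R sketch of the select-and-merge step on hypothetical toy tables shows the effect: a trial listing two drugs contributes two rows.

```r
# Hypothetical toy inputs: study metadata and drug interventions keyed by nct_id.
studies <- data.frame(
  nct_id = c("NCT00000001", "NCT00000002", "NCT00000003", "NCT00000004"),
  phase  = c("Phase 2", "Phase 3", NA, "Phase 1"),
  stringsAsFactors = FALSE
)
drugs <- data.frame(
  id     = c(101L, 102L, 103L, 104L),
  nct_id = c("NCT00000001", "NCT00000001", "NCT00000002", "NCT00000003"),
  name   = c("metformin", "placebo", "metformin", "aspirin"),
  stringsAsFactors = FALSE
)

# Drug trials: studies with at least one drug intervention (drops NCT00000004).
drug_trials <- studies[studies$nct_id %in% drugs$nct_id, ]

# Inner join: one row per drug intervention, carrying its study's phase.
drug_phase <- merge(drugs, drug_trials, by = "nct_id")

# Counts are per intervention row, so the two-drug trial counts twice.
phase_n <- table(phase = drug_phase$phase, useNA = "ifany")
print(nrow(drug_trials))
print(phase_n)
```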

3.2 Drugs by study start_year

## Warning: Ignoring 1 observations
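The plot itself is not reproduced here. A sketch of deriving `start_year`, assuming a `start_date` column in ISO `YYYY-MM-DD` form (an assumption about the input), with missing years dropped explicitly before counting; a single missing start date would plausibly explain the one ignored observation in the warning above.

```r
# Hypothetical drug rows with start dates; one date is missing.
drug_phase <- data.frame(
  id         = 101:104,
  start_date = c("2005-03-01", "2005-11-15", "2012-06-01", NA),
  stringsAsFactors = FALSE
)

# Extract the calendar year; a missing date yields NA.
drug_phase$start_year <- as.integer(format(as.Date(drug_phase$start_date), "%Y"))

# table() excludes NA by default, so the missing year is dropped here
# rather than at the plotting step.
year_n <- table(start_year = drug_phase$start_year)
print(year_n)
```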

3.3 Drug trials by phase and status

All drugs, by phase
phase               N
Early Phase 1    2615
Phase 1         48593
Phase 1/Phase 2 13288
Phase 2         68850
Phase 2/Phase 3  6503
Phase 3         49507
Phase 4         36331
NA              29390
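A phase-by-status cross-tabulation can be sketched with base R's `table()`; the rows below are hypothetical, and the column names follow the `phase` and `overall_status` labels used in this report.

```r
# Hypothetical drug rows carrying their study's phase and overall_status.
drug_phase <- data.frame(
  phase          = c("Phase 2", "Phase 2", "Phase 3", NA, "Phase 3"),
  overall_status = c("Completed", "Recruiting", "Completed", "Completed",
                     "Terminated"),
  stringsAsFactors = FALSE
)

# Two-way counts; useNA = "ifany" adds a row for rows with no phase.
phase_status <- table(phase = drug_phase$phase,
                      overall_status = drug_phase$overall_status,
                      useNA = "ifany")
print(phase_status)
```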

4 NextMove LeadMine NER

AACT drug names were resolved to standard names and chemical structures (SMILES) with NextMove LeadMine named-entity recognition (NER).

## [1] "Drugs with resolved structure: 180555 / 197300 (91.5%)"
All drugs, by overall_status
overall_status               N
Completed               114900
Recruiting               23262
Terminated               15384
Unknown status           15111
Active, not recruiting   10409
NA                        5675
Not yet recruiting        5604
Withdrawn                 5475
Enrolling by invitation    741
Suspended                  739

4.1 NER mentions by intervention ID

## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"

4.2 NER mentions by trial (NCT ID)

## [1] "Mentions by study: 92966 / 99647 (93.3%)"

4.3 NER mentions by drug name (as given in AACT)

## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
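One way to compute coverage at each of these levels is per group: an intervention ID, trial, or drug name counts as covered if at least one of its rows resolved to a structure. A sketch with hypothetical NER output, assuming an NA `smiles` marks an unresolved mention; the any-row rule is an assumed definition, not one stated in the report.

```r
# Hypothetical NER output: one row per mention; smiles is NA when unresolved.
ner <- data.frame(
  id     = c(101L, 101L, 102L, 103L, 104L),
  nct_id = c("NCT1", "NCT1", "NCT1", "NCT2", "NCT3"),
  name   = c("metformin", "metformin", "placebo", "metformin", "drug X"),
  smiles = c("CN(C)C(=N)N=C(N)N", NA, NA, "CN(C)C(=N)N=C(N)N", NA),
  stringsAsFactors = FALSE
)

# Fraction of groups (by the given key column) with any resolved structure.
coverage <- function(df, key) {
  hit <- tapply(!is.na(df$smiles), df[[key]], any)
  sprintf("%d / %d (%.1f%%)", sum(hit), length(hit), 100 * mean(hit))
}

for (key in c("id", "nct_id", "name")) {
  cat(key, ":", coverage(ner, key), "\n")
}
```

With these rows the rates are 2/4 by intervention ID, 2/3 by trial, and 1/3 by drug name, so the same mentions can give quite different percentages depending on the grouping key.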

5 PubChem

5.1 Intervention IDs to CIDs from PubChem (via SMILES)

## [1] "PubChem SMILES2CID hits: 3960 / 4698 (84.3%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153876"
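One way to map SMILES to CIDs is PubChem's PUG REST service; whether the report used it is an assumption. This sketch only builds request URLs (it makes no network call) and reproduces the hit-rate arithmetic above. SMILES must be fully percent-encoded, including `%`, which SMILES uses for two-digit ring closures.

```r
# PubChem PUG REST base URL.
pug_base <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Build SMILES -> CID lookup URLs. reserved = TRUE percent-encodes characters
# such as ( ) = / #; repeated = TRUE also encodes '%', which URLencode would
# otherwise leave alone when the string already looks encoded (e.g. "%10").
smiles_to_cid_url <- function(smiles) {
  enc <- vapply(smiles, URLencode, character(1),
                reserved = TRUE, repeated = TRUE, USE.NAMES = FALSE)
  paste0(pug_base, "/compound/smiles/", enc, "/cids/TXT")
}

# Hit rate in the same form as the report's output.
hit_rate <- function(n_hit, n_query) {
  sprintf("PubChem SMILES2CID hits: %d / %d (%.1f%%)", n_hit, n_query,
          100 * n_hit / n_query)
}

print(smiles_to_cid_url("CN(C)C(=N)N=C(N)N"))
print(hit_rate(3960L, 4698L))
```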

5.2 InChIKeys from PubChem (via CIDs)

## [1] "PubChem CIDs with InChIKeys: 3801"
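CID-to-InChIKey lookups can be batched by passing comma-separated CIDs to PUG REST's `compound/cid/.../property/InChIKey/CSV` endpoint. In this sketch the helper name and chunk size are hypothetical; CIDs are coerced to integer so that large IDs are not pasted in scientific notation (for example `1e+05`).

```r
pug_base <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Hypothetical helper: de-duplicate CIDs, split them into batches of
# chunk_size, and build one InChIKey property request URL per batch.
cid_inchikey_urls <- function(cids, chunk_size = 100) {
  cids <- as.integer(unique(cids))  # integer avoids "1e+05" in the URL
  batches <- split(cids, ceiling(seq_along(cids) / chunk_size))
  vapply(batches, function(b) {
    paste0(pug_base, "/compound/cid/", paste(b, collapse = ","),
           "/property/InChIKey/CSV")
  }, character(1), USE.NAMES = FALSE)
}

urls <- cid_inchikey_urls(c(2244, 4091, 2244, 3672), chunk_size = 2)
print(urls)
```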

6 ChEMBL

6.1 ChEMBL molecule IDs and properties (via InChIKeys)

## [1] "ChEMBL compounds mapped via InChIKeys: 3332"
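Matching on InChIKeys is a plain join once keys are normalized. A sketch assuming a table that pairs `chembl_id` with `standard_inchi_key` (in the ChEMBL schema these live in `molecule_dictionary` and `compound_structures`) and hypothetical toy rows; malformed keys are dropped with a check for the standard 27-character InChIKey layout.

```r
# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
is_inchikey <- function(x) grepl("^[A-Z]{14}-[A-Z]{10}-[A-Z]$", x)

# Hypothetical inputs: PubChem CID -> InChIKey, and ChEMBL ID -> InChIKey.
pubchem <- data.frame(
  cid      = c(2244L, 1L),
  inchikey = c(" bsynrymutxbxsq-uhfffaoysa-n", "NOT-A-KEY"),
  stringsAsFactors = FALSE
)
chembl <- data.frame(
  chembl_id          = "CHEMBL25",
  standard_inchi_key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
  stringsAsFactors   = FALSE
)

# Normalize case and whitespace, drop malformed keys, then inner-join.
pubchem$inchikey <- toupper(trimws(pubchem$inchikey))
pubchem <- pubchem[is_inchikey(pubchem$inchikey), ]
mapped <- merge(pubchem, chembl, by.x = "inchikey", by.y = "standard_inchi_key")
print(mapped)
```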

6.2 ChEMBL activities for mapped compounds

Select only activities that have a pChEMBL value, as a confidence filter.

## [1] "ChEMBL activities: 124438"
## [1] "ChEMBL activities molecules: 2287 ; targets: 3832 ; documents: 16198"
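ChEMBL reports pChEMBL as the negative base-10 logarithm of a molar potency or affinity value (IC50, EC50, Ki, Kd and similar), so a value in nM converts as 9 - log10(nM). The sketch shows that conversion and the filter-and-count step on hypothetical rows; it assumes `tid` has already been joined onto the activities from the assays table.

```r
# pChEMBL = -log10(molar value); for nM input this is 9 - log10(nM).
pchembl_from_nm <- function(value_nm) 9 - log10(value_nm)

# Hypothetical activity rows (ChEMBL-style column names; tid assumed joined in).
acts <- data.frame(
  molregno      = c(1L, 1L, 2L, 3L),
  tid           = c(10L, 11L, 10L, 12L),
  doc_id        = c(100L, 100L, 101L, 102L),
  pchembl_value = c(8, NA, 6.5, NA)
)

# Keep only activities that carry a pChEMBL value, then count distinct keys.
acts_p <- acts[!is.na(acts$pchembl_value), ]
acts_msg <- sprintf("ChEMBL activities molecules: %d ; targets: %d ; documents: %d",
                    length(unique(acts_p$molregno)), length(unique(acts_p$tid)),
                    length(unique(acts_p$doc_id)))
print(pchembl_from_nm(c(10, 100)))
print(acts_msg)
```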

6.3 ChEMBL targets (via activities)

## [1] "ChEMBL target proteins: 3157"

7 IDG/TCRD

## [1] "ChEMBL target proteins mapped to TCRD (human): 1806"

7.1 Targets by organism (top 10)

## [1] "Organisms: 187"
Targets by organism (top 10)
organism                   N_targets
Homo sapiens                    1806
Rattus norvegicus                529
Mus musculus                     238
Bos taurus                        98
Sus scrofa                        36
Cavia porcellus                   26
Escherichia coli K-12             19
Oryctolagus cuniculus             18
Escherichia coli                  17
Mycobacterium tuberculosis        17
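Counting targets per organism and keeping the top ten is a descending sort over a frequency table. A sketch on hypothetical rows; note that tied counts (such as the two organisms with 17 targets in the table above) have no meaningful order between them.

```r
# Hypothetical target table: one row per ChEMBL target with its organism.
targets <- data.frame(
  tid      = 1:6,
  organism = c("Homo sapiens", "Homo sapiens", "Rattus norvegicus",
               "Mus musculus", "Homo sapiens", "Rattus norvegicus"),
  stringsAsFactors = FALSE
)

# Count targets per organism, sort descending, keep the top n.
top_organisms <- function(df, n = 10) {
  counts <- sort(table(df$organism), decreasing = TRUE)
  head(data.frame(organism  = names(counts),
                  N_targets = as.integer(counts),
                  stringsAsFactors = FALSE), n)
}

print(length(unique(targets$organism)))  # number of organisms
print(top_organisms(targets, n = 2))
```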

7.2 Human single-protein targets only

## [1] "Human targets: 1806"
target_type                    N
SINGLE PROTEIN              1216
PROTEIN COMPLEX              247
PROTEIN FAMILY               210
PROTEIN COMPLEX GROUP         91
PROTEIN-PROTEIN INTERACTION   16
SELECTIVITY GROUP             14
CHIMERIC PROTEIN              12
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
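A sketch of the human single-protein filter on hypothetical rows, assuming each target row already carries its UniProt accession (the column name `uniprot` is an assumption). The check that accession counts equal target counts mirrors the output above, where both are 1216.

```r
# Hypothetical human targets with ChEMBL target_type and UniProt accession.
human <- data.frame(
  tid         = 1:5,
  target_type = c("SINGLE PROTEIN", "SINGLE PROTEIN", "PROTEIN COMPLEX",
                  "PROTEIN FAMILY", "SINGLE PROTEIN"),
  uniprot     = c("P35354", "P23219", NA, NA, "P00533"),
  stringsAsFactors = FALSE
)

# Target types, most frequent first.
print(sort(table(human$target_type), decreasing = TRUE))

# Keep single-protein targets; each should carry exactly one accession.
single <- human[human$target_type == "SINGLE PROTEIN", ]
stopifnot(!anyNA(single$uniprot))
single_msg <- sprintf("Human single-protein targets: %d ; unique UniProts: %d",
                      nrow(single), length(unique(single$uniprot)))
print(single_msg)
```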

7.3 Targets by IDG Target Development Level (TDL)

## [1] "   Tchem:    733" "   Tclin:    341" "    Tbio:    140"
## [4] "   Tdark:      2"
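The TDL summary above prints as a character vector of fixed-width lines. A sketch that produces lines in the same layout from hypothetical TDL labels; the `%8s:%7d` format is inferred from the printed spacing, not taken from the report's code.

```r
# Hypothetical TDL labels for single-protein targets (as assigned in TCRD).
tdl <- c("Tchem", "Tclin", "Tchem", "Tbio", "Tchem", "Tdark")

# Count per level, most frequent first, and format as "   Tchem:    733".
tdl_counts <- sort(table(tdl), decreasing = TRUE)
tdl_lines <- sprintf("%8s:%7d", names(tdl_counts), as.integer(tdl_counts))
print(tdl_lines)
```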